Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay
Belief propagation is a fundamental message-passing algorithm for
probabilistic reasoning and inference in graphical models. While it is known to
be exact on trees, in most applications belief propagation is run on graphs
with cycles. Understanding the behavior of "loopy" belief propagation has been
a major challenge for researchers in machine learning, and positive convergence
results for BP are known under strong assumptions which imply the underlying
graphical model exhibits decay of correlations. We show that under a natural
initialization, BP converges quickly to the global optimum of the Bethe free
energy for Ising models on arbitrary graphs, as long as the Ising model is
\emph{ferromagnetic} (i.e. neighbors prefer to be aligned). This holds even
though such models can exhibit long range correlations and may have multiple
suboptimal BP fixed points. We also show an analogous result for iterating the
(naive) mean-field equations; perhaps surprisingly, both results are
dimension-free in the sense that a constant number of iterations already
provides a good estimate to the Bethe/mean-field free energy.Comment: 24 pages; comments welcome
Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications
Markov random fields are a popular model for high-dimensional probability
distributions. Over the years, many mathematical, statistical and algorithmic
problems on them have been studied. Until recently, the only known algorithms
for provably learning them relied on exhaustive search, correlation decay or
various incoherence assumptions. Bresler gave an algorithm for learning general
Ising models on bounded degree graphs. His approach was based on a structural
result about mutual information in Ising models.
Here we take a more conceptual approach to proving lower bounds on the mutual
information through setting up an appropriate zero-sum game. Our proof
generalizes well beyond Ising models, to arbitrary Markov random fields with
higher order interactions. As an application, we obtain algorithms for learning
Markov random fields on bounded degree graphs on $n$ nodes with $r$-order
interactions in $n^r$ time and $\log n$ sample complexity. The sample
complexity is information theoretically optimal up to the dependence on the
maximum degree. The running time is nearly optimal under standard conjectures
about the hardness of learning parity with noise.
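As a toy illustration of the role mutual information plays in structure learning (Bresler-style neighborhood screening, shown only for intuition; it is not the zero-sum-game argument or the algorithm of this paper), one can estimate pairwise mutual information from samples and keep the pairs that exceed a threshold:

```python
import numpy as np
from itertools import combinations

def empirical_mutual_information(x, y):
    """Plug-in estimate (in nats) of the mutual information between two +/-1 samples."""
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def screen_edges(samples, threshold):
    """Propose candidate edges {i, j} whose empirical mutual information exceeds threshold.

    samples: (num_samples, n) array of +/-1 spin configurations drawn from the model.
    """
    n = samples.shape[1]
    return [(i, j) for i, j in combinations(range(n), 2)
            if empirical_mutual_information(samples[:, i], samples[:, j]) > threshold]
```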
The Vertex Sample Complexity of Free Energy is Polynomial
We study the following question: given a massive Markov random field on $n$
nodes, can a small sample from it provide a rough approximation to the free
energy $F_n = \log Z_n$?
Results in the graph limit literature by Borgs, Chayes, Lov\'asz, S\'os, and
Vesztergombi show that for Ising models on $n$ nodes with interactions of
strength $O(1/n)$, an $\epsilon n$ approximation to $F_n$ can be achieved by
sampling a randomly induced model on $2^{O(1/\epsilon^2)}$ nodes.
We show that the sampling complexity of this problem is {\em polynomial in}
$1/\epsilon$. We further show that a polynomial dependence on $1/\epsilon$
cannot be avoided.
Our results are very general as they apply to higher order Markov random
fields. For Markov random fields of order $r$, we obtain an algorithm that
achieves an $\epsilon n$ approximation using a number of samples polynomial in
$r$ and $1/\epsilon$ and running time that is $2^{O(1/\epsilon^2)}$ up to
polynomial factors in $r$ and $1/\epsilon$. For ferromagnetic Ising models, the
running time is polynomial in $1/\epsilon$.
Our results are intimately connected to recent research on the regularity
lemma and property testing, where the interest is in finding which properties
can be tested within $\epsilon$ error in time polynomial in $1/\epsilon$. In
particular, our proofs build on results from a recent work by Alon, de la Vega,
Kannan and Karpinski, who also introduced the notion of polynomial vertex
sample complexity. Another critical ingredient of the proof is an effective
bound by the authors of the paper relating the variational free energy and the
free energy.
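A minimal sketch of the vertex-sampling idea, assuming the dense normalization above (couplings of strength $O(1/n)$): draw $k$ random vertices, rescale the induced couplings by $n/k$, and compute the free energy of the small model by brute force. The rescaling convention and the estimator below are illustrative and are not the paper's exact procedure or guarantee.

```python
import numpy as np
from itertools import product

def log_partition_bruteforce(J, h):
    """Exact log Z of a small Ising model by enumerating all 2^k configurations.

    J is assumed symmetric with zero diagonal, so the energy is 0.5*s'Js + h's.
    """
    energies = []
    for sigma in product([-1, 1], repeat=len(h)):
        s = np.array(sigma)
        energies.append(0.5 * s @ J @ s + h @ s)
    energies = np.array(energies)
    m = energies.max()
    return m + np.log(np.exp(energies - m).sum())       # numerically stable log-sum-exp

def sampled_free_energy_density(J, h, k, rng):
    """Estimate (1/n) log Z_n of a dense model from a random induced submodel on k nodes."""
    n = len(h)
    idx = rng.choice(n, size=k, replace=False)
    J_sub = J[np.ix_(idx, idx)] * (n / k)               # rescale O(1/n) couplings to O(1/k)
    return log_partition_bruteforce(J_sub, h[idx]) / k
```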
Approximating Partition Functions in Constant Time
We study approximations of the partition function of dense graphical models.
Partition functions of graphical models play a fundamental role in statistical
physics, in statistics and in machine learning. Two of the main methods for
approximating the partition function are Markov Chain Monte Carlo and
Variational Methods. An impressive body of work in mathematics, physics and
theoretical computer science provides conditions under which Markov Chain Monte
Carlo methods converge in polynomial time. These methods often lead to
polynomial time approximation algorithms for the partition function in cases
where the underlying model exhibits correlation decay. There are very few
theoretical guarantees for the performance of variational methods. One
exception is recent results by Risteski (2016) who considered dense graphical
models and showed that using variational methods, it is possible to find an
$\epsilon n$ additive approximation to the log partition function in time
$n^{O(1/\epsilon^2)}$, even in a regime where correlation decay does not hold.
We show that under essentially the same conditions, an $\epsilon n$ additive
approximation of the log partition function can be found in constant time,
independent of $n$. In particular, our results cover dense Ising and Potts
models as well as dense graphical models with $k$-wise interaction. They
also apply for low threshold rank models.
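For orientation, the variational methods mentioned here start from the Gibbs variational characterization of the log partition function (a standard fact, not a result of this paper): for an energy function $f$ whose Gibbs measure is proportional to $e^{f}$,
\[
  \log Z \;=\; \sup_{\mu}\Big(\mathbb{E}_{\sigma\sim\mu}\big[f(\sigma)\big] + H(\mu)\Big),
\]
where the supremum runs over all probability distributions $\mu$ on spin configurations and $H(\mu)$ is the Shannon entropy. Restricting $\mu$ to a tractable family, such as product measures or pseudo-distributions from a convex relaxation, is what turns this identity into an algorithm.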
Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective
The free energy is a key quantity of interest in Ising models, but
unfortunately, computing it in general is computationally intractable. Two
popular (variational) approximation schemes for estimating the free energy of
general Ising models (in particular, even in regimes where correlation decay
does not hold) are: (i) the mean-field approximation with roots in statistical
physics, which estimates the free energy from below, and (ii) hierarchies of
convex relaxations with roots in theoretical computer science, which estimate
the free energy from above. We show, surprisingly, that the tight regime for
both methods to compute the free energy to leading order is identical.
More precisely, we show that the mean-field approximation is within
$O((n\|J\|_F)^{2/3})$ of the free energy, where $\|J\|_F$ denotes the
Frobenius norm of the interaction matrix of the Ising model. This
simultaneously subsumes both the breakthrough work of Basak and Mukherjee, who
showed the tight result that the mean-field approximation is within $o(n)$
whenever $\|J\|_F = o(\sqrt{n})$, as well as the work of Jain, Koehler, and
Mossel, who gave the previously best known non-asymptotic bound of
$O((n\|J\|_F)^{2/3}\log^{1/3}(n\|J\|_F))$. We give a simple, algorithmic
proof of this result using a convex relaxation proposed by Risteski based on
the Sherali-Adams hierarchy, automatically giving sub-exponential time
approximation schemes for the free energy in this entire regime. Our
algorithmic result is tight under Gap-ETH.
We furthermore combine our techniques with spin glass theory to prove (in a
strong sense) the optimality of correlation rounding, refuting a recent
conjecture of Allen, O'Donnell, and Zhou. Finally, we give the tight
generalization of all of these results to $k$-MRFs, capturing as a special case
previous work on approximating MAX-$k$-CSP.
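Written out under one common normalization (the paper's conventions and constants may differ), the naive mean-field functional and the bound described above read
\[
  F^{\mathrm{MF}} \;=\; \max_{x\in[-1,1]^n}\Big(\tfrac{1}{2}\,x^{\top} J x + \sum_{i=1}^{n} H\big(\tfrac{1+x_i}{2}\big)\Big),
  \qquad
  0 \;\le\; F - F^{\mathrm{MF}} \;\le\; O\big((n\,\|J\|_F)^{2/3}\big),
\]
where $F = \log Z$ is the free energy and $H(p)$ denotes the binary entropy function.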
Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
Gaussian Graphical Models (GGMs) have wide-ranging applications in machine
learning and the natural and social sciences. In most of the settings in which
they are applied, the number of observed samples is much smaller than the
dimension and they are assumed to be sparse. While there are a variety of
algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph
structure with a logarithmic number of samples, they assume various conditions
that require the precision matrix to be in some sense well-conditioned.
Here we give the first polynomial-time algorithms for learning attractive
GGMs and walk-summable GGMs with a logarithmic number of samples without any
such assumptions. In particular, our algorithms can tolerate strong
dependencies among the variables. Our result for structure recovery in
walk-summable GGMs is derived from a more general result for efficient sparse
linear regression in walk-summable models without any norm dependencies. We
complement our results with experiments showing that many existing algorithms
fail even in some simple settings where there are long dependency chains,
whereas ours do not.
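For context, the sketch below runs one of the baseline methods mentioned above, Graphical Lasso (via scikit-learn), on a synthetic chain-structured GGM, the kind of long-dependency-chain setting the experiments highlight; the chain strength, regularization parameter, and sample size are arbitrary illustrative choices, and this is not the authors' new algorithm.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sparse "chain" precision matrix: long dependency chains make the problem
# badly conditioned, which is the regime the abstract is concerned with.
n = 30
Theta = 1.01 * np.eye(n)
for i in range(n - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.49           # near-critical chain couplings
Sigma = np.linalg.inv(Theta)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(n), Sigma, size=200)

# Baseline structure estimate: threshold the Graphical Lasso precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
support = np.abs(model.precision_) > 1e-3
print("estimated number of edges:", (support.sum() - n) // 2)
```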
Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation
The analysis of Belief Propagation and other algorithms for the {\em
reconstruction problem} plays a key role in the analysis of community detection
in inference on graphs, phylogenetic reconstruction in bioinformatics, and the
cavity method in statistical physics.
We prove a conjecture of Evans, Kenyon, Peres, and Schulman (2000) which
states that any bounded memory message passing algorithm is statistically much
weaker than Belief Propagation for the reconstruction problem. More formally,
any recursive algorithm with bounded memory for the reconstruction problem on
trees with the binary symmetric channel has a phase transition strictly
below the Belief Propagation threshold, also known as the Kesten-Stigum bound.
The proof combines, in a novel fashion, tools from recursive reconstruction,
information theory, and optimal transport, and also establishes an asymptotic
normality result for BP and other message-passing algorithms near the critical
threshold.
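A minimal simulation of this setting, assuming a $d$-ary tree and the binary symmetric channel: broadcast a root spin to the leaves and run exact BP (dynamic programming up the tree) to reconstruct it. The arity, noise level, depth, and trial count are illustrative, and the bounded-memory algorithms the theorem is about are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, depth = 3, 0.15, 8                    # d-ary tree, BSC flip probability, depth
theta = 1.0 - 2.0 * eps
print("Kesten-Stigum parameter d*theta^2 =", d * theta**2)   # > 1: BP reconstructs

def broadcast_leaves(root):
    """Broadcast a +/-1 root spin down the tree through a binary symmetric channel."""
    spins = np.array([root])
    for _ in range(depth):
        spins = np.repeat(spins, d)                            # give each node d children
        spins = np.where(rng.random(spins.size) < eps, -spins, spins)
    return spins

def bp_root_llr(leaves):
    """Exact BP on the tree: log-likelihood ratio for the root given the leaves."""
    msg = 2.0 * np.arctanh(theta) * leaves                     # leaf -> parent messages
    for _ in range(depth - 1):
        belief = msg.reshape(-1, d).sum(axis=1)                # combine the d children
        msg = 2.0 * np.arctanh(theta * np.tanh(belief / 2.0))  # pass through the channel
    return msg.reshape(-1, d).sum(axis=1)[0]                   # belief at the root

trials, correct = 400, 0
for _ in range(trials):
    root = rng.choice([-1, 1])
    correct += int(np.sign(bp_root_llr(broadcast_leaves(root))) == root)
print("BP reconstruction accuracy:", correct / trials)
```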
Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis
There has been a large amount of interest, both in the past and particularly
recently, in the power of different families of universal approximators, e.g.
ReLU networks, polynomials, rational functions. However, current research has
focused almost exclusively on understanding this problem in a worst-case
setting, e.g. bounding the error of the best infinity-norm approximation in a
box. In this setting a high-degree polynomial is required to even approximate a
single ReLU.
However, in real applications with high dimensional data we expect it is only
important to approximate the desired function well on certain relevant parts of
its domain. With this motivation, we analyze the ability of neural networks and
polynomial kernels of bounded degree to achieve good statistical performance on
a simple, natural inference problem with sparse latent structure. We give
almost-tight bounds on the performance of both neural networks and low degree
polynomials for this problem. Our bounds for polynomials involve new techniques
which may be of independent interest and show major qualitative differences
with what is known in the worst-case setting.
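As a toy numerical illustration of the worst-case claim above, one can fit low-degree polynomials to a single ReLU on $[-1,1]$ (a Chebyshev least-squares fit on a fine grid, used here as a stand-in for the best sup-norm approximation) and observe that the sup-norm error decays only slowly with the degree:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 4001)
relu = np.maximum(x, 0.0)
for degree in (3, 7, 15, 31, 63):
    coeffs = np.polynomial.chebyshev.chebfit(x, relu, degree)   # least-squares fit
    approx = np.polynomial.chebyshev.chebval(x, coeffs)
    print(f"degree {degree:2d}: sup error ~ {np.abs(approx - relu).max():.4f}")
```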
A Phase Transition in Arrow's Theorem
Arrow's Theorem concerns a fundamental problem in social choice theory: given
the individual preferences of members of a group, how can they be aggregated to
form rational group preferences? Arrow showed that in an election between three
or more candidates, there are situations where any voting rule satisfying a
small list of natural "fairness" axioms must produce an apparently irrational
intransitive outcome. Furthermore, quantitative versions of Arrow's Theorem in
the literature show that when voters choose rankings in an i.i.d.\ fashion, the
outcome is intransitive with non-negligible probability.
It is natural to ask if such a quantitative version of Arrow's Theorem holds
for non-i.i.d.\ models. To answer this question, we study Arrow's Theorem under
a natural non-i.i.d.\ model of voters inspired by canonical models in
statistical physics; indeed, a version of this model was previously introduced
by Raffaelli and Marsili in the physics literature. This model has a parameter,
temperature, that prescribes the correlation between different voters. We show
that the behavior of Arrow's Theorem in this model undergoes a striking phase
transition: in the entire high temperature regime of the model, a Quantitative
Arrow's Theorem holds showing that the probability of paradox for any voting
rule satisfying the axioms is non-negligible; this is tight because the
probability of paradox under pairwise majority goes to zero when approaching
the critical temperature, and becomes exponentially small in the number of
voters beyond it. We prove this occurs in another natural model of correlated
voters and conjecture that this phenomenon is quite general.
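For the i.i.d.\ baseline mentioned above, a quick Monte Carlo estimate of the classical Condorcet paradox probability (pairwise majority over three candidates being intransitive under i.i.d.\ uniform rankings) can be obtained as follows; the number of voters and trials are arbitrary illustrative choices.

```python
import numpy as np
from itertools import permutations

def majority_prefers(votes, x, y):
    """True if a strict majority of the rankings place candidate x above candidate y."""
    return sum(v.index(x) < v.index(y) for v in votes) > len(votes) / 2

rng = np.random.default_rng(0)
rankings = list(permutations("ABC"))
num_voters, trials, paradoxes = 101, 20000, 0
for _ in range(trials):
    votes = [rankings[i] for i in rng.integers(len(rankings), size=num_voters)]
    ab = majority_prefers(votes, "A", "B")
    bc = majority_prefers(votes, "B", "C")
    ca = majority_prefers(votes, "C", "A")
    paradoxes += int(ab == bc == ca)          # a 3-cycle in either direction is intransitive
print("estimated probability of a Condorcet paradox:", paradoxes / trials)
```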
A Spectral Condition for Spectral Gap: Fast Mixing in High-Temperature Ising Models
We prove that Ising models on the hypercube with general quadratic
interactions satisfy a Poincar\'{e} inequality with respect to the natural
Dirichlet form corresponding to Glauber dynamics, as soon as the operator norm
of the interaction matrix is smaller than $1$. The inequality implies a control
on the mixing time of the Glauber dynamics. Our techniques rely on a
localization procedure which establishes a structural result, stating that
Ising measures may be decomposed into a mixture of measures with quadratic
potentials of rank one, and provides a framework for proving concentration
bounds for high temperature Ising models.
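A minimal sketch of the Glauber dynamics in question, assuming an interaction matrix scaled to have operator norm $0.5 < 1$ (the abstract's regime); the scaling, field, and number of update steps are illustrative choices.

```python
import numpy as np

def glauber_step(sigma, J, h, rng):
    """Resample one uniformly chosen spin from its conditional distribution.

    Assumes J is symmetric with zero diagonal and the Gibbs measure is
    proportional to exp(0.5 * sigma' J sigma + h' sigma).
    """
    i = rng.integers(len(sigma))
    field = J[i] @ sigma + h[i]                          # local field at site i
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))          # P(sigma_i = +1 | rest)
    sigma[i] = 1 if rng.random() < p_plus else -1

rng = np.random.default_rng(0)
n = 100
A = rng.normal(size=(n, n))
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)
J = 0.5 * A / np.linalg.norm(A, 2)       # operator norm 0.5, i.e. strictly below 1
h = np.zeros(n)
sigma = rng.choice([-1, 1], size=n)
for _ in range(50 * n):                  # 50 sweeps' worth of single-site updates
    glauber_step(sigma, J, h, rng)
print("empirical magnetization:", sigma.mean())
```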